10 research outputs found

Hurricane Forecasting: A Novel Multimodal Machine Learning Framework

Authors: Bertsimas Dimitris; Boussioux Léonard; Guénais Théo; Zeng Cynthia
Publication date: 11/06/2021

This paper describes a machine learning (ML) framework for tropical cyclone intensity and track forecasting that combines multiple distinct ML techniques and diverse data sources. Our framework, which we refer to as Hurricast (HURR), combines distinct data processing techniques using gradient-boosted trees and novel encoder-decoder architectures, including CNN, GRU, and Transformer components. We propose a deep-feature extractor methodology to mix spatial-temporal data with statistical data efficiently. Our multimodal framework makes it possible to base forecasts on a wide range of data sources, including historical storm data and visual data such as reanalysis atmospheric images. We evaluate our models against current operational forecasts in the North Atlantic and Eastern Pacific basins over 2016-2019 at a 24-hour lead time, and show that our models consistently outperform statistical-dynamical models and compete with the best dynamical models, while computing forecasts in seconds. Furthermore, including Hurricast in an operational forecast consensus model leads to a significant improvement of 5% - 15% over NHC's official forecast, highlighting its complementarity with existing approaches. In summary, our work demonstrates that combining different data sources and distinct machine learning methodologies can lead to superior tropical cyclone forecasting. We hope that this work opens the door for further use of machine learning in meteorological forecasting.
Comment: Under revision by the AMS' Weather and Forecasting journal.

arXiv.org e-Print Archive

InsectUp: Crowdsourcing Insect Observations to Assess Demographic Shifts and Improve Classification

Authors: Boussioux Léonard; Cherti Mehdi; Giro-Larraz Tomás; Guille-Escuret Charles; Kégl Balázs
Publication date: 29/01/2020

Insects play such a crucial role in ecosystems that a demographic shift in just a few species can have devastating consequences at environmental, social, and economic levels. Despite this, the evaluation of insect demography is strongly limited by the difficulty of collecting census data at sufficient scale. We propose a method to gather and leverage observations from bystanders, hikers, and entomology enthusiasts in order to provide researchers with data that could significantly help anticipate and identify environmental threats. Finally, we show that there is indeed interest on both sides in such a collaboration.
Comment: Appearing at the International Conference on Machine Learning, AI for Social Good Workshop, Long Beach, United States, 2019. Appearing at the International Conference on Computer Vision, AI for Wildlife Conservation Workshop, Seoul, South Korea, 2019. 5 pages, 6 figures.

arXiv.org e-Print Archive

Holistic Deep Learning

Authors: Bertsimas Dimitris; Boussioux Léonard; Carballo Kimberly Villalobos; Li Michael Lingzhi; Paskov Alex; Paskov Ivan
Publication date: 02/11/2022

There is much interest in solving the challenges that arise when applying deep neural network models in real-world environments. In particular, three areas have received considerable attention: adversarial robustness, parameter sparsity, and output stability. Despite numerous attempts to solve these problems independently, little work addresses them simultaneously. In this paper, we address the problem of constructing holistic deep learning models by proposing a novel formulation that solves these issues in combination. Real-world experiments on both tabular and MNIST datasets show that our formulation can simultaneously improve accuracy, robustness, stability, and sparsity over traditional deep learning models, among many others.
Comment: In preparation for Machine Learning.

arXiv.org e-Print Archive

TabText: A Flexible and Contextual Approach to Tabular Data Representation

Authors: Bertsimas Dimitris; Boussioux Léonard; Carballo Kimberly Villalobos; Ma Yu; Na Liangyuan; Soenksen Luis R.; Zeng Cynthia
Publication date: 21/07/2023

Tabular data is essential for applying machine learning across various industries. However, traditional data processing methods do not fully utilize the information available in tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks, including patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.

arXiv.org e-Print Archive

Over-MAP: Structural Attention Mechanism and Automated Semantic Segmentation Ensembled for Uncertainty Prediction

Authors: Boussioux Léonard; Kantor Charles; Rauby Brice; Talbot Hugues
Publication venue: HAL CCSD
Publication date: 02/02/2021

Both theoretical and practical problems in deep learning classification require solutions for assessing prediction uncertainty, but current state-of-the-art methods in this area are computationally expensive. In this paper, we propose a new confidence measure dubbed Over-MAP that utilizes a measure of overlap between structural attention mechanisms and segmentation methods, which is of particular interest in fine-grained contexts where accuracy matters. We show that this classification confidence increases with the degree of overlap. The associated confidence and identification tools are conceptually simple, efficient, and of high practical interest, as they allow for weeding out misleading examples in training data. Our measure is currently deployed in the real world on widely used platforms to annotate large-scale data efficiently.

INRIA a CCSD electronic archive server

Gradient-Based Localization and Spatial Attention for Confidence Measure in Fine-Grained Recognition using Deep Neural Networks

Authors: Boussioux Léonard; Kantor Charles A.; Rauby Brice; Talbot Hugues
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 18/05/2021

Both theoretical and practical problems in deep learning classification benefit from assessing prediction uncertainty. However, current state-of-the-art methods in this area are computationally expensive: for example, the general method of Loquercio et al. (2020) for uncertainty estimation in deep learning relies on Monte-Carlo sampling. We propose a new, efficient confidence measure, later dubbed Over-MAP, that utilizes a measure of overlap between structural attention mechanisms and segmentation methods. It does not rely on sampling or retraining. We show that the classification confidence increases with the degree of overlap. The associated confidence and identification tools are conceptually simple, efficient, and of high practical interest, as they allow for weeding out misleading examples in training data. Our measure is currently deployed in the real world on widely used platforms to annotate large-scale data efficiently.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Over-MAP: Structural Attention Mechanism and Automated Semantic Segmentation Ensembled for Uncertainty Prediction

Authors: Boussioux Léonard; Kantor Charles; Rauby Brice; Talbot Hugues
Publication venue: HAL CCSD
Publication date: 02/02/2021

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Geo-Spatiotemporal Features and Shape-Based Prior Knowledge for Fine-grained Imbalanced Data Classification

Authors: Boussioux Léonard; Jehanno Emmanuel; Kantor Charles; Luccioni Alexandra; Rauby Brice; Rolnick David; Skreta Marta; Talbot Hugues
Publication venue: HAL CCSD
Publication date: 07/01/2021

Fine-grained classification aims at distinguishing between items that share a similar global appearance and similar patterns but differ in minute details. The primary challenges come from both small inter-class variations and large intra-class variations. In this article, we propose to combine several innovations to improve fine-grained classification in the wildlife use case, which is of practical interest to experts. We utilize geo-spatiotemporal data to enrich the picture information and further improve performance. We also investigate state-of-the-art methods for handling the imbalanced data issue.

HAL-CentraleSupelec

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1